Abstract

Background: Somatostatin and its related neuroendocrine peptides have a wide variety of physiological functions 
that are mediated by five somatostatin receptors with gene names SSTR1-5 in mammals. To resolve their evolution 
in vertebrates we have investigated the SSTR genes and a large number of adjacent gene families by phylogeny 
and conserved synteny analyses in a broad range of vertebrate species. 

Results: We find that the SSTRs form two families that belong to distinct paralogons. We observe not only 
chromosomal similarities reflecting the paralogy relationships between the SSTR-bearing chromosome regions, but 
also extensive rearrangements between these regions in teleost fish genomes, including fusions and translocations 
followed by reshuffling through intrachromosomal rearrangements. These events obscure the paralogy relationships 
but are still tractable thanks to the many genomes now available. We have identified a previously unrecognized 
SSTR subtype, SSTR6, previously misidentified as either SSTR1 or SSTR4. 

Conclusions: Two ancestral SSTR-bearing chromosome regions were duplicated in the two basal vertebrate 
tetraploidizations (2R). One of these ancestral SSTR genes generated SSTR2, -3 and -5, the other gave rise to SSTR1,
-4 and -6. Subsequently SSTR6 was lost in tetrapods and SSTR4 in teleosts. Our study shows that extensive 
chromosomal rearrangements have taken place between related chromosome regions in teleosts, but that these 
events can be resolved by investigating several distantly related species. 
Background

The availability of a large variety of annotated and 
assembled vertebrate genome sequences has made it 
possible to address specific evolutionary questions on a 
genome-wide scale. This includes both large-scale ana- 
lyses of genome evolution [1-5] and targeted compara- 
tive evolutionary studies of specific gene families. The 
Ensembl genome database (www.ensembl.org) includes 
genomes for representatives of most vertebrate classes, 
as well as suitable out-groups for the study of verte- 
brate evolution [6]. The recent addition of the gen- 
omes of the Comoran coelacanth Latimeria chalumnae 
and the spotted gar Lepisosteus oculatus complements 
the previous set of species with pivotal out-groups for 
tetrapods and teleost fishes, respectively. They are especially
important for studies of genomic events that
have taken place in either teleost or tetrapod evolu- 
tion, as is the case for the chromosomal regions 
described in the present study. 

The basal vertebrate whole genome duplications (2R) 
[1,3,4] and subsequently the teleost-specific genome du- 
plication (3R) [2,7] have expanded numerous endocrine 
and neuronal gene families, see for example references 
[8-15]. Here we have subjected the chromosomal regions 
harboring the somatostatin receptor family genes to a 
detailed analysis by collecting sequences from a broad 
range of vertebrate genomes, including several teleost 
fishes as well as the spotted gar and the coelacanth. 

Somatostatin, the short peptide responsible for inhib- 
ition of growth hormone release, was sequenced from 
sheep hypothalamus in 1973 [16] and its discovery was 
one of the achievements highlighted by the 1977 Nobel 
Prize in Physiology or Medicine. Subsequently this 14-
amino-acid peptide was sequenced in numerous other
vertebrate species and was found to be highly conserved 
during evolution. Somatostatin is widely distributed and
serves as a neuroendocrine peptide regulating the
pituitary, as a neuropeptide acting on other neurons, and
as an endocrine peptide. In accordance with this, somatostatin
has been reported to have many physiological
effects [17]. A somatostatin-related peptide was discov- 
ered in mouse and human and was named cortistatin or 
somatostatin-2 [18]. It is now known to be present 
throughout the tetrapods. In teleost fishes additional 
somatostatin-like peptides exist named somatostatin 
3-6, each encoded by a separate gene [19]. All of 
these duplicates may have arisen through chromosome 
duplications in 2R and 3R [19,20]. 

After the first identification of binding sites for som- 
atostatin, evidence began to accumulate for more than 
one receptor subtype. The cloning era of G-protein- 
coupled receptors led to the discovery of five somato- 
statin receptor subtypes in mammals, named SSTR1 
through 5 [21]. The conserved structure of somatostatin 
receptor genes consists of a single exon encoding pro- 
tein products of approximately 360 to 420 amino acids. 
The somatostatin receptors have been classified into two 
subfamilies based upon their degree of sequence identity: 
The human SSTR1 and SSTR4 amino acid sequences 
share 70% sequence identity in the region spanning 
TM1 to TM7 (including the loops), while SSTR2, -3 and 
-5 share 56-66% amino acid sequence identity to each 
other. All five receptor subtypes inhibit adenylyl cyclases 
[22] and they can also trigger other second messenger 
pathways to various extents. 
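
As an illustration of how pairwise identity figures such as those quoted above are typically derived, the following minimal Python sketch computes percent identity over the columns of two pre-aligned sequences in which both have a residue. The function name and the toy sequences are hypothetical and are not taken from this study. The denominator convention (ignoring columns with a gap in either sequence) is an assumption, since the convention used for the values above is not stated here.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: columns with a gap in either sequence are not counted.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not real SSTR sequences).
toy_a = "MKT-LLVAGC"
toy_b = "MKTALLIAG-"
print(round(percent_identity(toy_a, toy_b), 1))
```

For a comparison restricted to a region such as TM1 to TM7, the same function could be applied to the corresponding slice of the alignment.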

Homologs of the mammalian somatostatin receptors 
have been described in several teleost fishes, see Nelson 
& Sheridan (2005) [23] for review. However, no SSTR4 
subtype has yet been described in a teleost fish. The 
known SSTR repertoire in chicken is the same as in 
mammals and several of the receptors have been studied 
functionally [24,25]. It was proposed several years ago 
that the SSTR family expanded in 2R [21] although it 
was not clear how the appearance of the five members 
correlated with the two genome doublings. A more re- 
cent phylogenetic analysis [26] presented a tree that was 
unresolved both with respect to species taxonomy and 
somatostatin receptor subtypes. Other investigators have 
proposed that the SSTRs arose from a series of duplications
throughout vertebrate evolution [27,28].

Our analyses allow us to conclude that the chromo- 
some duplications in early vertebrate evolution (2R), and 
in the teleost tetraploidization (3R), can explain the 
known repertoire of vertebrate somatostatin receptors. 
Furthermore, we have discovered that one of the teleost 
receptors represents a sixth ancestral vertebrate subtype
that we have called SSTR6, which is still present in
some teleost fishes, the spotted gar and the coelacanth, 
but has been lost in tetrapods. Thus, the somatostatin 
receptor system obtained its present complexity already 
in the early stages of vertebrate evolution. By centering 
our analyses around the SSTR genes we could also 
disentangle complex rearrangements in the SSTR- 
bearing chromosome regions in teleost fish genomes. 
This has implications for analyses of conserved synteny 
and the assignment of orthology for genes located in 
these regions. 

Results 

Phylogenetic analysis of the SSTR gene family; 
identification of a sixth SSTR subtype 

Somatostatin receptor amino acid sequences were col- 
lected from genome databases for several species repre- 
senting most of the vertebrate classes: In addition to 
tetrapod and teleost fish genomes, the genomes of the 
Comoran coelacanth (Latimeria chalumnae) and the 
spotted gar (Lepisosteus oculatus) were investigated in 
order to provide relative dating points earlier in the evo- 
lution of lobe-finned fishes (Sarcopterygii) and ray- 
finned fishes (Actinopterygii), respectively. The identified 
amino acid sequences include predictions from several 
previously unknown SSTR sequences. These results are 
summarized in Table 1, and detailed descriptions of the 
identified sequences are included as Supplemental note 1 
(see Additional file 1). 

The SSTR amino acid sequences identified in the gen- 
ome databases were used to create an alignment for 
phylogenetic analyses in order to determine the identity 
of previously unknown SSTR sequences and study the 
evolution of this gene family. Using the human 
kisspeptin-1 receptor as out-group, the resulting phylo- 
genetic maximum likelihood (PhyML) tree in Figure 1 
shows that the vertebrate SSTR family consists of six 
subtype clusters representing the five known SSTR sub- 
types SSTR1 through SSTR5, as well as a previously 
unrecognized sixth subtype. We have named these 
sequences SSTR6 in our studies. In agreement with pre- 
vious analyses of fewer sequences [21,23,27], the tree has 
two well-defined ancestral branches; one including 
SSTR2, -3 and -5, and one containing the SSTR1 and -4 
as well as the SSTR6 subtype. Both branches are 
well-supported, and the separate SSTR subtypes form 
well-supported clusters within each branch, using 
both bootstrapping and SH-like approximate likelihood 
ratio statistics (see Additional file 2, Figures S1 and S2).
Some subtypes are missing from some species' genome 
databases (see Additional file 1, Supplemental note 1). 
Notably, sequences of the sixth subtype, SSTR6, could 
not be identified in any of the investigated tetrapod 
sequences, and SSTR4 sequences could not be identified 
Table 1 Summary of the identified somatostatin receptor sequences analyzed in this study

Genus and species (genome assembly version)
  Assigned sequence name  Chromosome/linkage group/genomic scaffold location

Mammals

Homo sapiens (GRCh37)
  Human SSTR1             14: 38.68 Mb
  Human SSTR2             17: 71.16 Mb
  Human SSTR3             22: 37.60 Mb
  Human SSTR4             20: 23.02 Mb
  Human SSTR5             16: 1.12 Mb

Mus musculus (NCBIM37)
  Mouse SSTR1             12: 59.31 Mb
  Mouse SSTR2             11: 113.48 Mb
  Mouse SSTR3             15: 78.37 Mb
  Mouse SSTR4             2: 148.22 Mb
  Mouse SSTR5             17: 25.63 Mb

Canis familiaris (BROADD2)
  Dog SSTR1               8: 19.58 Mb
  Dog SSTR2               9: 10.00 Mb
  Dog SSTR3               10: 30.40 Mb
  Dog SSTR5               6: 42.65 Mb

Monodelphis domestica (BROADO5)
  Opossum SSTR1           1: 278.65 Mb
  Opossum SSTR2           2: 217.49 Mb
  Opossum SSTR3           8: 91.98 Mb
  Opossum SSTR4           1: 598.18 Mb
  Opossum SSTR5           6: 153.47 Kb

Birds

Gallus gallus (WASHUC2)
  Chicken SSTR1           5: 39.75 Mb
  Chicken SSTR2           18: 9.00 Mb
  Chicken SSTR3           1: 53.39 Mb
  Chicken SSTR4           3: 3.27 Mb
  Chicken SSTR5           14: 5.64 Mb

Reptiles

Anolis carolinensis (AnoCar2.0)
  Anole lizard SSTR1      see footnote a
  Anole lizard SSTR2      2: 96.75 Mb
  Anole lizard SSTR3      5: 22.84 Mb
  Anole lizard SSTR5      GL343263.1: 1.76 Mb

Amphibians

Xenopus tropicalis (JGI4.2)
  Frog SSTR1              GL172781.1: 1.07 Mb
  Frog SSTR2              GL172812.1: 1.79 Mb
  Frog SSTR3              GL172724.1: 1.41 Mb
  Frog SSTR4              GL172884.1: 512.43 Kb
  Frog SSTR5              GL172659.1: 446.17 Kb

Coelacanth

Latimeria chalumnae (LatCha1)
  Coelacanth SSTR1        JH126598.1: 0.53 Mb
  Coelacanth SSTR2        JH126581.1: 3.45 Mb
  Coelacanth SSTR3        JH129649.1: 0.21 Mb
  Coelacanth SSTR4        JH126648.1: 2.61 Mb
  Coelacanth SSTR5        JH129247.1: 0.21 Mb
  Coelacanth SSTR6        JH127490.1: 0.26 Mb
  Coelacanth SSTRX        JH126581.1: 3.47 Mb

Spotted gar

Lepisosteus oculatus (LepOcu1)
  Spotted gar SSTR1       LG7: 4.44 Mb
  Spotted gar SSTR2       LG10: 34.84 Mb
  Spotted gar SSTR3       LG12: 34.19 Mb
  Spotted gar SSTR5       LG13: 4.69 Mb
  Spotted gar SSTR6       LG28: 1.08 Mb

Teleost fish

Danio rerio (Zv9)
  Zebrafish SSTR1         17: 10.35 Mb
  Zebrafish SSTR2a        3: 63.08 Mb
  Zebrafish SSTR2b        12: 1.73 Mb
  Zebrafish SSTR3a        3: 29.75 Mb
  Zebrafish SSTR3b        Scaffold Zv9_NA631: 3.42 Kb
  Zebrafish SSTR5a        24: 16.78 Mb
  Zebrafish SSTR5b        1: 55.01 Mb
  Zebrafish SSTR6         7: 19.63 Mb

Gasterosteus aculeatus (BROADS1)
  Stickleback SSTR2a      groupXI: 9.50 Mb
  Stickleback SSTR2b      groupV: 6.81 Mb
  Stickleback SSTR3a      groupXI: 15.59 Mb
  Stickleback SSTR5a      groupXI: 11.72 Mb
  Stickleback SSTR5b      groupIX: 14.95 Mb
  Stickleback SSTR6       scaffold_47: 436.21 Kb

Oryzias latipes (MEDAKA1)
  Medaka SSTR2a           8: 10.93 Mb
  Medaka SSTR2b           scaffold5841: 160 bp
  Medaka SSTR3a           8: 2.80 Mb
  Medaka SSTR3b           1: 29.10 Mb
  Medaka SSTR5a           8: 13.75 Mb

Tetraodon nigroviridis (TETRAODON8)
  Green puffer SSTR2a     3: 10.44 Mb
  Green puffer SSTR2b     2: 4.83 Mb
  Green puffer SSTR3a     3: 15.06 Mb
  Green puffer SSTR3b     18: 10.39 Mb
  Green puffer SSTR3c     Un_random: 59.49 Mb
  Green puffer SSTR5b     18: 2.40 Mb

Takifugu rubripes (FUGU4)
  Fugu SSTR2a             [location illegible in source]
  Fugu SSTR2b             scaffold_3: 3.77 Kb
  Fugu SSTR3a             scaffold_359: 200.56 Kb
  Fugu SSTR3b             scaffold_407: 33.36 Kb
  Fugu SSTR5b             scaffold_189: 267.65 Kb
  Fugu SSTR6              scaffold_164: 38.74 Kb

Invertebrates

Drosophila melanogaster (BDGP5)
  Fruit fly Drostar1      3L: 18.55 Mb
  Fruit fly Drostar2      3L: 18.48 Mb

a The Anole lizard SSTR1 sequence could not be identified in the most updated assembly (AnoCar2.0), however it is located on genomic scaffold_0 at 284.26 Kb in
the previous assembly (AnoCar1.0, Ensembl database version 60).



in teleost fishes or in the spotted gar. All six SSTR sub- 
types are represented in the coelacanth, demonstrating 
that the absence of SSTR4 genes in the spotted gar and 
teleost fishes, and of SSTR6 genes in tetrapods likely 
resulted from secondary gene losses. In teleost fishes, an 
SSTR1 sequence could only be identified in the zebrafish 
genome. 

There are teleost-specific duplicates of SSTR2, -3 and -5
forming well-supported a- and b-clusters within their respective
subtypes. In the spotted gar genome only single
copies of the SSTR2, -3 and -5 sequences were found, and
these branch basal to the respective teleost-specific a- and
b-duplicate clusters, which strongly supports the duplication
of SSTR2, -3 and -5 early in the teleost lineage. Taken
together this means that some teleost species may have 
up to eight different SSTR family members, including an 
SSTR subtype that has not been previously described. In 
our analyses, the zebrafish genome has this repertoire of 
receptors: SSTR1, -2a, -2b, -3a, -3b, -5a, -5b and -6. 
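
The bookkeeping behind this count can be sketched in a few lines of Python. The snippet below is only an illustrative model under stated assumptions: it encodes the subtypes reported above as plain sets, treats 3R as adding a and b copies of SSTR2, -3 and -5 only, and does not attempt to model gene loss mechanistically. All variable and function names are hypothetical.

```python
from __future__ import annotations

# Six subtypes inferred to have been present early in vertebrate evolution.
ANCESTRAL_SUBTYPES = {"SSTR1", "SSTR2", "SSTR3", "SSTR4", "SSTR5", "SSTR6"}

# Subtypes for which teleost-specific a and b duplicates were observed.
DUPLICATED_IN_TELEOSTS = {"SSTR2", "SSTR3", "SSTR5"}


def teleost_repertoire(lost: set[str]) -> list[str]:
    """Maximal teleost repertoire, given the subtypes lost before the teleost radiation."""
    genes = []
    for subtype in sorted(ANCESTRAL_SUBTYPES - lost):
        if subtype in DUPLICATED_IN_TELEOSTS:
            # Assumption: both teleost-specific copies are retained.
            genes.extend([subtype + "a", subtype + "b"])
        else:
            genes.append(subtype)
    return genes


# SSTR4 was not found in the spotted gar or in any teleost.
zebrafish_like = teleost_repertoire(lost={"SSTR4"})
print(len(zebrafish_like), zebrafish_like)
```

Species with fewer receptors, such as medaka in Table 1, would correspond to additional lineage-specific losses or to gaps in the genome assemblies, which this sketch does not distinguish.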

The known Drosophila allatostatin C receptor 1 and 2
sequences, called Drostar1 and Drostar2, were included
in the phylogenetic analyses due to their close sequence
and functional similarity with the mammalian somatostatin
receptors [29]. These sequences cluster together
[Figure 1: tree image. Arrowheads mark nodes supported by less than 50% of bootstrap replicates.]

Figure 1 Phylogenetic maximum likelihood tree of the somatostatin receptor gene family. The topology is supported by a non-parametric
bootstrap test with 100 replicates as well as an SH-like approximate likelihood ratio test (aLRT). The tree is rooted with the human kisspeptin
receptor 1 sequence (not shown). Branch support (bootstrap replicates) for deep divergences is shown at the nodes. All branch support values
are shown in Figure S1 (bootstrap replicates) and Figure S2 (aLRT) (see Additional file 2). The phylogenetic tree shows six
well-supported subtype clusters, with the somatostatin receptor subtypes SSTR2, -3 and -5 forming one ancestral branch and the SSTR1, -4 and -6
receptor subtypes forming one ancestral branch. This phylogenetic analysis supports the emergence of all six subtypes early in vertebrate
evolution, with the subsequent loss of SSTR4 in ray-finned fishes, before the divergence of the spotted gar and teleost lineages, and of SSTR6 in
the tetrapod lineage. All six subtypes could be identified in the coelacanth genome. A seventh SSTR2-like sequence, called SSTRX in the tree,
could also be identified on the same genomic scaffold in the coelacanth genome (see Additional file 1, Supplemental note 1). There are
well-supported teleost-specific duplicate branches of SSTR2, -3 and -5, although all could not be identified in all teleost genomes. These
duplicates have been named a and b based on the phylogenetic analysis. There is a third SSTR3 sequence in the green puffer, called SSTR3c in
the tree.



basal to the vertebrate SSTR sequences and most 
probably represent an independent duplication event. 
It was not possible to identify true SSTR orthologs in 
the tunicates Ciona intestinalis and Ciona savignyi or
in the Florida lancelet (amphioxus) Branchiostoma 
floridae. 



Syntenic gene families 

In addition to making a phylogenetic tree of somato- 
statin receptors in vertebrates, our aim was to determine 
whether the SSTR genes were duplicated in the chromo- 
some doublings in 2R. To test this hypothesis, syntenic 
(neighboring) gene families in the SSTR gene-bearing 
chromosome regions were analyzed with respect to their
phylogenies, using both neighbor joining (NJ) and 
PhyML methods, and the chromosomal locations of the 
member genes (see Methods below). In total, 47 syntenic 
gene families were analyzed. Our results of the con- 
served synteny analyses are presented as tables compar- 
ing the chromosomal locations of all the identified 
syntenic family member genes in the genomes of human, 
chicken, zebrafish, stickleback and medaka. Due to size 
restrictions, the tables have been included as additional 
data files (see Additional files 3 and 4). The phylogenetic 
trees of all the neighboring gene families have also been 
included as additional files (see Additional files 5 and 6). 
These tables and phylogenetic trees are the basis for our
description of the results below. 
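
One simple way to read tables of this kind is to count, for every pair of chromosomes, how many gene families have members on both. The Python sketch below shows that tally under stated assumptions: the input format, the function name and the toy family names are hypothetical, and the toy data are not taken from Additional files 3 and 4.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def shared_family_counts(family_locations: dict[str, set[str]]) -> Counter:
    """Count, for every chromosome pair, how many families have members on both."""
    counts: Counter = Counter()
    for chromosomes in family_locations.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(chromosomes), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input: family name -> chromosomes carrying a family member.
toy = {
    "famA": {"14", "20"},
    "famB": {"14", "20", "11"},
    "famC": {"14", "19"},
}
for pair, n in shared_family_counts(toy).most_common():
    print(pair, n)
```

Chromosome pairs shared by many families are candidates for paralogous regions. Such a count is only a first screen and does not replace the phylogenetic analysis of each family.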

Conserved synteny analysis of the SSTR1, -4 and -6 
chromosome regions 

The chromosomal locations of the SSTR genes as well 
as the early divergence of two ancestral SSTR branches 
in the phylogenetic tree suggested that the SSTR1, -4 
and -6 genes derive from one ancestral SSTR gene, and 
the SSTR2, -3 and -5 genes from a separate ancestral 
SSTR gene, and that these two ancestral genes were 
located in distinct paralogons (related chromosome 
groups). Therefore two separate analyses of conserved 
synteny were done. To investigate whether the SSTR1, -4 
and -6 genes arose by duplications of a single ancestral 
gene in 2R we have carried out phylogenetic analyses of 
17 syntenic gene families and the chromosomal locations 
of all neighboring family members were noted and com- 
pared between species (see Additional file 3). In sum- 
mary, all but two of the 17 identified syntenic gene 
families in the SSTR1, -4 and -6 chromosome blocks 
(Table 2) have phylogenetic trees that either support or 
are consistent with duplications early in vertebrate evo- 
lution (see Additional file 5). These gene families have 
tree topologies with subtype clusters diverging in the 
same time window as 2R, i.e., after the divergence of in- 
vertebrate chordates and vertebrates but before the di- 
vergence of lobe-finned fishes (including tetrapods) and 
ray-finned fishes (including teleosts). The PhyML topologies
of the RIN and PYG gene families are shown as
examples in Figure 2. Several gene families also have 
teleost-specific duplicate clusters, supporting subsequent 
duplications in 3R, see for example PYGM and RIN2 
clusters in Figure 2 as well as the teleost FLRT1 ortho- 
logs (see Additional file 2, Figure S4). Some of the neigh- 
boring families have inconsistencies between the NJ and 
PhyML trees (see Additional file 1, Supplemental note 2);
however, they were considered supportive if they showed
the topology described above for at least one of the 
methods. 
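
The decision rule described above can be written down as a small Python sketch. It is a deliberate simplification under stated assumptions: each tree is reduced to a mapping from subtype cluster to the set of lineages found in that cluster, the lineage labels and function names are hypothetical, and branch support values and lineage-specific gene losses (which the actual assessment has to take into account) are ignored.

```python
from __future__ import annotations

LOBE_FINNED = "lobe-finned"
RAY_FINNED = "ray-finned"
INVERTEBRATE = "invertebrate"


def cluster_in_2r_window(lineages: set[str]) -> bool:
    """True if a subtype cluster holds both vertebrate lineages and no invertebrate."""
    has_both = LOBE_FINNED in lineages and RAY_FINNED in lineages
    return has_both and INVERTEBRATE not in lineages


def tree_supports_2r(subtype_clusters: dict[str, set[str]]) -> bool:
    """True if there are at least two subtype clusters, each inside the 2R window."""
    if len(subtype_clusters) < 2:
        return False
    return all(cluster_in_2r_window(l) for l in subtype_clusters.values())


def family_is_supportive(nj: dict[str, set[str]], phyml: dict[str, set[str]]) -> bool:
    """A family counts as supportive if at least one method shows the topology."""
    return tree_supports_2r(nj) or tree_supports_2r(phyml)


# Hypothetical family: PhyML resolves two clusters, NJ leaves one mixed cluster.
phyml_tree = {"X1": {LOBE_FINNED, RAY_FINNED}, "X2": {LOBE_FINNED, RAY_FINNED}}
nj_tree = {"X": {LOBE_FINNED, RAY_FINNED, INVERTEBRATE}}
print(family_is_supportive(nj_tree, phyml_tree))
```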



Table 2 Neighboring gene families analyzed for
conserved synteny in the SSTR1, -4 and -6-bearing
chromosome blocks

Symbol a    Description a                             Root (if other than D. melanogaster)

ABHD12      Abhydrolase domain containing 12
CFL         Cofilin and destrin
            (actin depolymerizing factor)
FLRT        Fibronectin leucine rich                  C. savignyi
            transmembrane protein
FOXA        Forkhead box A
ISM         Isthmin homolog                           C. intestinalis
JAG         Jagged
NIN         Ninein (GSK3B interacting protein)
NKX2        NK2 homeobox 1 and 4
PAX         Paired box 1 and 9
PYG         Glycogen phosphorylase;
            brain, liver and muscle variants
RALGAPA     Ral GTPase activating protein,
            alpha subunit
RIN         Ras and Rab interactor
SEC23       Sec23 homologs A and B
SLC24A      Solute carrier family                     B. floridae
            24 members 3 and 4
SNX         Sorting nexin 5, 6 and 32
SPTLC       Serine palmitoyltransferase,
            long chain base subunit 2 and 3
VSX         Visual system homeobox                    C. elegans

a Gene family names and descriptions are based on approved HUGO Gene
Nomenclature Committee (HGNC) gene symbols and descriptions, or known
aliases from the NCBI Entrez Gene database. Where not all known protein
subtypes/isoforms are part of the gene family, the included subtypes are
specified.



The positional and phylogenetic data combined dem- 
onstrate the conserved synteny between the chromosome 
blocks containing SSTR1, -4 and -6 in the analyzed 
genomes. In the human genome, these correspond to 
well-defined regions on chromosomes 14 and 20, and to 
a lesser degree 11 and 19. Although no SSTR6-bearing
chromosomal region was used in the selection of syntenic
gene families, the ISM (Figure S8), PYG (Figure
S13), RIN (Figure S15) and SLC24A (Figure S17) families 
have members neighboring the SSTR6 genes in one or 
several of the teleost genomes (see Additional file 3). This 
dataset also shows that rearrangements between the 
homologous chromosome regions have been common in 
the teleost lineage. For example, genes located on both 
chromosomes 14 and 20 in the human genome have 
orthologs that are distributed primarily between chromo- 
somes 13, 17 and 20 in the zebrafish genome in a way 
that suggests the substantive exchange of paralogs be- 
tween these regions. This can be seen for the teleost 
Table 3 Neighboring gene families analyzed for
conserved synteny in the SSTR2, -3 and -5-bearing
chromosome blocks

Symbol a    Description a                             Root (if other than D. melanogaster)

ADAP        ArfGAP with dual PH domains
ATP2A       ATPase, Ca++ transporting,
            cardiac muscle, fast twitch
C1QTNF      C1q and tumor necrosis                    B. floridae
            factor related protein
CABP        Calcium binding protein 1, 3, 4 and 5
CACNA1      Calcium channel, voltage dependent,
            T type alpha subunit
CREBBP      CREB binding protein
CYTH        Cytohesin
FAM20       Family with sequence similarity 20
FNG         Fringe homolog b
FSCN        Fascin homolog 1 and 2,
            actin-bundling protein
GGA         Golgi-associated, gamma adaptin
            ear containing, ARF-binding protein
GLPR        Glucagon, glucagon-like and gastric       C. intestinalis
            inhibitory polypeptide receptors
GRIN2       Glutamate receptor, ionotropic,
            N-methyl D-aspartate 2
KCNJ        Potassium inwardly-rectifying channel,    C. intestinalis
            subfamily J member 2, 4, 12 and 14
KCTD        Potassium channel tetramerisation
            domain containing 2, 5 and 17
METRN       Meteorin, glial cell differentiation      B. floridae
            regulator
NDE         nudE nuclear distribution
            gene E homolog
RAB11FIP    RAB11 family interacting
            protein 3 and 4 (class II)
RADIL       Ras association and DIL                   B. floridae
            domains/Ras interacting protein
RHBDF       Rhomboid 5 homolog
RHOT        Ras homolog gene family,
            member T1 and T2
RPH3A       Rabphilin 3A homolog/double
            C2-like domains, alpha
SDK         Sidekick cell adhesion molecule           C. elegans
SOX         Sex-determining region Y-box 8, 9 and 10
TEX2        Testis expressed 2
TNRC6       Trinucleotide repeat containing 6
TOM1        Target of myb 1
TTYH        Tweety homolog
USP         Ubiquitin specific peptidase 31 and 43
WFIKKN      WAP, follistatin/kazal, immunoglobulin,   B. floridae
            kunitz and netrin domain containing

a Gene family names and descriptions follow the same system as Table 2.
b Complete description: Lunatic, manic and radical fringe homolog.
O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase.



orthologs of the PYGL and PYGB genes compared with 
the RIN3 and RIN2 orthologs (Figure 2). In the stickle- 
back and medaka genomes there seems to have been 
fewer rearrangements: orthologs of genes located on 
human chromosome 20 are located on stickleback link- 
age group XV between approximately 3.75 and 4 Mb, 
and on medaka chromosome 22 between approximately 
16 and 16.34 Mb, which suggests translocation of small 
chromosomal blocks (see Additional file 3). 

Conserved synteny analysis of the SSTR2, -3 and -5 
chromosome regions 

For the investigation of paralogy relationships between the 
chromosomal regions that harbor the SSTR2, -3 and -5 
genes, 30 syntenic gene families were analyzed as 
described above for the SSTR1, -4 and -6-bearing chromo- 
some regions. To identify these gene families, the SSTR2, 
-3 and -5-bearing chromosome regions in the chicken and 
stickleback genomes were analyzed for conserved synteny 
(see Methods below). Two separate starting points 
(chicken and stickleback) were used because the chromo- 
somal locations of the SSTR genes in the teleost genomes, 
with SSTR2, -3 and -5 homologs located on the same 
chromosome, suggest a different expansion scenario than 
the tetrapod genomes, including chicken (Table 1). In this 
way we could collect a dataset of neighboring gene fam- 
ilies without favoring one scenario over the other. In sum- 
mary, 23 of the 30 syntenic gene families in the SSTR2, -3 
and -5 chromosome blocks have tree topologies that sup- 
port an expansion from one ancestral vertebrate gene 
through 2R (see Additional file 6, Figures S21-S50). Four
are consistent with 2R, but show some inconsistencies between
phylogenetic methods (see Additional file 1, Supplemental
note 3): ADAP (Figure S21), FAM20 (Figure S28),
RPH3A (Figure S42) and TOM1 (Figure S47). Only two are
considered inconclusive: CABP (Figure S24) and GGA
(Figure S31) (see Additional file 6). Many of the analyzed
families also show topologies that support the duplication 
of family members in 3R. The PhyML topologies of the 
GRIN2 family (Figure 3) and of the FNG and FSCN fam- 
ilies (Figure 4) are shown as examples. 

As described previously, the chromosomal locations of 
all neighboring family members were compared between 
species (see Additional file 4) and the phylogenetic tree 
topologies of the neighboring gene families were used to 
infer the paralogy and orthology relationships. The identified
conserved synteny blocks correspond to regions of
chromosomes 7, 16, 17, 19 and 22 in the human
genome. This dataset shows that there have been extensive
chromosome rearrangements between the paralogous
chromosome regions in the teleost genomes, and
to some extent in the human genome. For example, 
many gene families with members on chicken chromo- 
some 14 have orthologs distributed between human 
Figure 2 Phylogenetic maximum likelihood trees of the PYG and RIN gene families. The glycogen phosphorylase (PYG) and Ras and Rab
interactor (RIN) gene families are neighboring families of the SSTR1, -4 and -6 chromosomal regions. Monophyletic subtype clusters including
both tetrapod and teleost sequences are indicated by bars to the right. Chromosomal or genomic scaffold assignments of the family members
are indicated next to species names. Lowercase a and b are used to distinguish sequences located on the same chromosomes. Branch support
values (bootstrap replicates) for deep divergences are shown at the nodes. The trees were rooted with the identified fruit fly sequences. All
neighboring gene family trees for the SSTR1, -4 and -6-bearing regions, including NJ analyses and all branch support values, are shown in Figures
S4-S20 (see Additional file 5).



chromosomes 16 and 7, and stickleback linkage groups 
V, IX and XI. In the zebrafish genome these same gene
families have orthologs spread over more chromosomes: 
most are on chromosomes 3, 12, 1 and 24, but there are 
individual orthologs of two families on chromosome 22 
and an unmapped genomic scaffold (see Additional file 4). 
Several of these rearrangements can be seen for members 
of the GRIN2 (Figure 3), FNG and FSCN (Figure 4) fam- 
ilies, with teleost-specific duplicates in different subtype 
clusters co-located on the same chromosomes. The human 
GRIN2B sequence also seems to have translocated to 
chromosome 12. The GRIN2A, -2B and -2C clusters show 
well-supported teleost duplicate branches, supporting a du- 
plication in 3R (Figure 3). In these branches we observe tele- 
ost genes located on the same chromosomes (for instance 
zebrafish GRIN2A, -2B and -2C orthologs, all on chromo- 
some 3), likely due to the chromosomal rearrangements. 
The FSCN gene family has several teleost sequences located 
on the same chromosomes, for instance on zebrafish 
chromosome 3, medaka chromosome 8 and stickleback 
linkage group XI, and the FNG family has teleost duplicates 
in the LFNG cluster (Figure 4). However, the topology is not 
clear for the LFNG teleost duplicates. These rearrangements 
in the teleost lineage likely explain why the SSTR2a, -3a and
-5a genes also are located in the same chromosomal regions
in teleost genomes (Table 1), as will be discussed below. 

A few gene families identified in the analysis of con- 
served synteny, namely ATP2A (Figure S22), CABP 
(Figure S24), GLPR (Figure S32) and RPH3A (Figure 
S42) (see Additional file 6), have individual paralogs 
on different chromosomes or genomic scaffolds, as des- 
cribed in Supplemental note 3 (see Additional file 1). 

Discussion 

Evolution of the SSTR family 

Our phylogenetic analyses of the SSTR gene family pro- 
vide strong support for expansion and diversification in 
both the 2R and 3R events, giving rise to six different 
SSTR subtype genes early in vertebrate evolution and 
subsequently expanding the SSTR2, -3 and -5 branch in 
the teleost lineage. Our evolutionary scheme of the 
SSTR gene family expansion is presented in Figure 5. 
The sixth subtype, which we have called SSTR6, was pre- 
viously unrecognized. We have identified it in the ray- 
finned fishes, including the spotted gar and the teleosts, 
as well as in the coelacanth, a member of the lobe- 
finned fish lineage. Thus, it was clearly present before 
the divergence of lobe-finned and ray-finned fishes. Its 
Figure 3 Phylogenetic maximum likelihood tree of the GRIN2
gene family. The ionotropic glutamate receptor 2 (GRIN2) gene 
family is a neighboring family of the SSTR2, -3 and -5 chromosomal 
regions. Phylogenetic methods, monophyletic clusters and leaf 
names as in Figure 2. 



chromosomal position in the teleosts supports origin in 
2R (see below). Conversely, the SSTR4 gene was only 
identified in the lobe-finned fishes, including tetrapods 
and the coelacanth. These losses are likely the result of 
secondary and independent events: SSTR6 from the 
lineage leading to tetrapods some time after the diver- 
gence of the coelacanth lineage, and SSTR4 from the 
ray-finned fish before the divergence of the spotted gar 
and the lineage leading to teleosts (Figure 5). The top- 
ology of the SSTR1, -4 and -6 branch supports this sce- 
nario (Figure 1). All six SSTR subtypes that emerged 
early in vertebrate evolution are represented in the gen- 
ome of the coelacanth. The additional seventh coela- 
canth sequence that we have called SSTRX is located on 
the same genomic scaffold with the same orientation as 
the SSTR2 sequence (Table 1) and it clusters in the most 
basal position in the SSTR2 cluster. This, together with 
its branch length in the tree, indicates that it is a 
lineage-specific duplicate of SSTR2 with a higher
evolutionary rate.
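
The reasoning about reciprocal losses can be illustrated by comparing the subtypes found in each lineage with the six ancestral subtypes. The following is a minimal sketch, not part of the original analysis: the presence sets are transcribed from the text, the lineage labels are simplifications (gaps in individual teleost genomes are ignored), and the function name is hypothetical.

```python
ANCESTRAL = {"SSTR1", "SSTR2", "SSTR3", "SSTR4", "SSTR5", "SSTR6"}

# Subtypes reported as present in each lineage, transcribed from the text.
PRESENT = {
    "tetrapods": {"SSTR1", "SSTR2", "SSTR3", "SSTR4", "SSTR5"},
    "coelacanth": set(ANCESTRAL),
    "spotted gar": ANCESTRAL - {"SSTR4"},
    "teleosts": ANCESTRAL - {"SSTR4"},
}


def inferred_losses(present):
    """Ancestral subtypes absent from each lineage, i.e. candidate losses."""
    return {lineage: sorted(ANCESTRAL - genes) for lineage, genes in present.items()}


for lineage, lost in inferred_losses(PRESENT).items():
    print(f"{lineage}: {', '.join(lost) or 'none'}")
```

The shared absence of SSTR4 from the spotted gar and the teleosts is most simply explained by a single loss before these lineages diverged, which is the interpretation given above.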

The somatostatin system has been reported to have 
arisen prior to the divergence of insects and verte- 
brates, i.e., before the protostome-deuterostome split. 



Drosophila melanogaster and other insects have a 
somatostatin-like 15-amino-acid peptide that has been 
named ASTC for allatostatin C [30]. Two ASTC 
receptors were identified in D. melanogaster, with clos- 
est relationship to human somatostatin and opioid 
receptors [29]. The receptors were named Drostar1
and -2 and seem to have arisen through a lineage- 
specific duplication in insects. We propose that two 
ancient SSTR genes were present before the emergence 
of vertebrates based on our comparative analyses. 
However, we were unable to identify any unambiguous 
SSTR family members in the genome databases of the 
amphioxus Branchiostoma floridae, and the tunicates 
Ciona intestinalis and Ciona savignyi. The latter are 
members of the urochordate lineage which constitutes 
the closest extant relatives of vertebrates [31,32]. A 
previous analysis of G-protein coupled receptor 
sequences in the Florida lancelet (Branchiostoma flori- 
dae) genome identified several lancelet-specific expan- 
sions of somatostatin-, galanin- and opioid receptor- 
like sequences, totaling 90 distinct sequences in this 
cluster [33]. Among these sequences, 18 cluster to- 
gether with the human SSTR sequences, although the 
resolution of this phylogenetic analysis is very low. In 
any case, these lineage-specific duplications preclude 
the identification of true orthologs to the vertebrate 
somatostatin receptors, although there are several can- 
didates. We have identified three putative somatostatin 
receptor sequences in the genome of the sea lamprey 
Petromyzon marinus, and their database identifiers are 
noted in Table S1 (see Additional file 7). However, due
to the incomplete status of this genome assembly, and 
thus the lack of synteny data, we refrain from specu- 
lating about their orthology relationships. 

Somatostatin receptors have been described for several 
teleost fish species in addition to the ones that we have 
studied. In each species usually one or a few sequences 
have been reported, except for goldfish, Carassius aura- 
tus, where eight sequences have been published [34-37]. 
Our additional tree presented in Figure S3 (see 
Additional file 2) confirms previous suggestions [23,28] 
that two of these correspond to SSTR1 as a result of the 
goldfish-specific fourth tetraploidization (4R) that took 
place some 12-15 MYA [38,39]. Other goldfish 
sequences correspond to subtypes SSTR2, SSTR3a and 
SSTR3b. The three SSTR5-like sequences in goldfish
were initially named 5a, 5b, and 5c. The one named 5c 
is orthologous to 5b in our comparisons and the ones 
named 5a and 5b appear to be 4R duplicates of 5a (see 
Additional file 2, Figure S3). The latter have accumu- 
lated as many as 66 amino acid differences in this short 
time period (resulting in 83% sequence identity), 
whereas the SSTR1 4R-generated duplicates differ at only 
5 positions. In the orange-spotted grouper, Epinephelus 
Figure 4 Phylogenetic maximum likelihood trees of the FNG and FSCN gene families. The fringe homolog (FNG) and fascin homolog 1
and 2 (FSCN) gene families are neighboring gene families of the SSTR2, -3 and -5 chromosomal regions. Phylogenetic methods, monophyletic 
clusters and leaf names as in Figure 2. All neighboring gene family trees for the SSTR2, -3 and -5-bearing regions, including NJ analyses and all 
branch support values, are shown in Figure S21-S50 (see Additional file 6). 



coioides, four sequences have been reported [27] that we 
can now identify as SSTR1, SSTR2b, SSTR3a, and 
SSTR5a (see Additional file 2, Figure S3). The SSTR3
sequence determined in the black ghost knifefish
Apteronotus albifrons [40], an electric fish, is SSTR3b, and the
two sequences from the cichlid Astatotilapia burtoni 
[41] are SSTR2a and SSTR3a. In the rainbow trout, 
Oncorhynchus mykiss, three sequences have been 
reported [42,43]. In our additional analysis the two 
sequences identified as SSTR1a and -1b [43,44] can be
correctly identified as two copies of SSTR6 likely 
resulting from the salmonid fourth tetraploidization 
(see Additional file 2, Figure S3). 
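
The identity figure quoted above for the goldfish SSTR5a 4R duplicates can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the compared length of 388 positions is back-calculated from the reported 66 differences and 83% identity rather than taken from the alignment itself.

```python
def percent_identity(differences: int, aligned_length: int) -> float:
    """Percent identity over an ungapped comparison of the given length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (aligned_length - differences) / aligned_length


def implied_length(differences: int, identity_percent: float) -> float:
    """Compared length implied by a difference count and a percent identity."""
    return differences / (1.0 - identity_percent / 100.0)


# 66 differences at 83% identity imply roughly 388 compared positions.
length = round(implied_length(66, 83.0))
print(length, round(percent_identity(66, length)))
```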

SSTR-bearing chromosome regions were duplicated in 
vertebrate whole genome duplications 

Two separate analyses of conserved synteny were carried 
out in order to test the hypothesis that each of the SSTR1, 
-4, -6 and SSTR2, -3, -5-branches of the SSTR gene family 
was multiplied as a result of duplications of two distinct 
chromosome regions. In total we have compared the 
chromosomal locations of genes in 47 gene families located 
in SSTR-bearing chromosome regions. These positional 
data were combined with phylogenetic analyses of the gene 
families to infer the likely orthology and paralogy relationships
within each family, as well as to determine the time
window of the duplications and chromosome rearrangements.
As a whole, these analyses show that the SSTR1, -4
and -6-bearing chromosome regions on the one hand, and 
the SSTR2, -3 and -5-regions on the other, belong to dis- 
tinct paralogons that were formed by chromosome duplica- 
tions during the same time period in early vertebrate 
evolution. Using relative dating in the phylogenetic ana- 
lyses, as well as the species distribution of the genes, we 
can place the duplication events to the period after the di- 
vergence of invertebrate chordates and vertebrates, but be- 
fore the divergence of lobe-finned fishes (including 
tetrapods) and ray-finned fishes (including teleosts). This 
means that the identified regions of paralogy likely resulted 
from duplications of ancestral chromosome regions in the 
same time window as the early vertebrate tetraploidizations. 
Thus, our analysis provides further support for 2R. Our 
analyses also indicate that these two paralogy regions dupli- 
cated further in the time-window of the teleost-specific 
whole genome duplication 3R, although for the SSTR gene 
family only duplicates of SSTR2, -3 and -5 were retained 
(Table 1). Based on the phylogenetic analyses of the SSTR 
family (Figure 1, Additional file 2), these duplicates have 
been named adding the letters a and b to the gene symbols. 
Our proposed evolutionary scenario for the evolution of 
the SSTR-bearing chromosome regions is presented in 
Figure 6. 
Figure 5 Proposed somatostatin receptor evolutionary scheme. Numbers denote chromosome or linkage group assignments of SSTR genes
in mapped genomes. Some of the SSTR genes have not been mapped to chromosomes or linkage groups, which is indicated by asterisks.
Evolutionary scheme: Two ancestral vertebrate SSTR genes located on two different chromosomes duplicated in 2R, generating the vertebrate
SSTR gene repertoire of SSTR1, -4 and -6, and SSTR2, -3 and -5 respectively. SSTR6 was lost from the lobe-finned fish lineage some time after the
divergence of the coelacanth, and SSTR4 was lost from the ray-finned fish lineage some time before the divergence of the spotted gar. Following
chromosome fusions, the ancestral teleost SSTR2, -3 and -5 genes duplicated in 3R, while only one gene for each of SSTR1 and -6 was
conserved in some teleost lineages. Subsequent chromosome rearrangements in teleost evolution moved SSTR genes to different chromosomes.
Data from neighboring gene families are consistent with these chromosome rearrangements. Not all SSTR subtype genes could be identified in
some teleost genomes (Table 1). This could be due either to genuine gene losses or to the incomplete nature of these genome
databases.



The paralogous regions we have identified bearing 
SSTR1, -4, and -6-genes, and SSTR2, -3 and -5-genes, 
and the time window for their origin, are consistent with 
previous large-scale genomic analyses. In the analysis of 
paralogous chromosome regions in the human genome 
compared to the Branchiostoma floridae genome [3] 
these regions (Figure 6) correspond to ancestral chordate 
linkage groups numbered 11 and 15 respectively, indi- 
cating an origin in 2R. A separate reconstruction of the 
vertebrate ancestral genome [4] also inferred that these 
regions originated from two separate ancestral chromo- 
somes that quadrupled in 2R. In the latter analysis the 
SSTR1, -4 and -6-bearing regions correspond to the an- 
cestral linkage group called G and the SSTR2, -3 and -5- 
bearing regions to ancestral linkage group called I. The 
analysis of the first medaka draft genome [5], as well as 
the aforementioned reconstruction of the ancestral
vertebrate genome, support the conclusion that both
paralogous regions duplicated further in 3R, but that
there have been several major rearrangements that ob- 
scure the paralogy relationships. The medaka genome is 
an appropriate starting point for the discussion of 
chromosomal rearrangements in the teleost lineage since 
it seems to have preserved more of the ancestral teleost 
genome organization [5]. 

Chromosomal rearrangements in teleost genomes 

Initially, the locations of the SSTR2a, -3a and -5a dupli- 
cates in teleost genomes suggested that the expansion of 
the somatostatin receptor family might have partially oc- 
curred through other mechanisms than 2R. In the me- 
daka and stickleback genomes all three paralogs are 
located within regions of approximately 11 Mb on 
chromosome 8 and 9 Mb on linkage group XI, respect- 
ively. In the zebrafish, SSTR2a and -3a are located ap- 
proximately 33 Mb apart on chromosome 3 while 
Figure 6 Evolutionary scenario for the vertebrate SSTR gene-bearing chromosome regions. Two ancient vertebrate chromosomes bearing
one SSTR gene each duplicated in 2R, generating two vertebrate paralogons; one bearing SSTR1, -4 and -6 genes (purple, pink, blue and 
turquoise blocks) and one bearing SSTR2, -3 and -5 genes (red, yellow, orange and green blocks). After the divergence of lobe-finned fishes 
(including tetrapods) and ray-finned fishes (including teleosts), three of the 2R-generated blocks fused in the ray-finned fish lineage before 3R. 
Both paralogons duplicated in 3R, followed by rearrangements between paralogous chromosome blocks, obscuring the ancestral conserved 
synteny. One of the fused, duplicated and rearranged chromosome blocks split through a fission event. The paralogous chromosome regions 
have been reconstructed for the chicken, human and medaka genomes by mapping the identified paralogous gene families. The upper color 
blocks represent ancestral chromosome regions in each lineage. Dashed boxes represent losses of chromosome blocks. Chromosome 
rearrangements involving blocks of genes are represented by arrows, while smaller translocations of genes are represented by dashed arrows. 
The full datasets are presented in Tables S4 and S5 (see Additional files 3 and 4). 



SSTR5a is located on chromosome 24. In the green puffer
SSTR2a and -3a are located approximately 5 Mb
apart on chromosome 8 and additionally the SSTR3b
and -5b genes are co-localized on chromosome 18
approximately 8 Mb apart (see Table 1 for locations).
These arrangements would suggest that ancestral seg- 
mental duplications were involved. However, in all non- 
teleost genomes, notably that of the spotted gar, the 
SSTR2, -3 and -5 paralogs are located on different chro- 
mosomes or linkage groups (Table 1). To make sure that 
our analysis of conserved synteny did not favor the 2R 
scenario over the ancestral tandem duplication scenario, 
both the chicken and the stickleback genomes were used 
as starting points for the identification of neighboring
families in the SSTR2, -3 and -5 paralogon. For the
SSTR1, -4 and -6 paralogon we started from the human
and chicken genomes, since the locations of the SSTR
genes did not indicate different expansion scenarios in 
tetrapods and teleosts. Based on the combined pos- 
itional and phylogenetic data using tetrapods as well as 
teleosts we conclude that both of the SSTR-bearing 
paralogons have undergone a series of inter- and intra- 
chromosomal rearrangements in the teleost lineage that 
obscure the ancestral organization. To deduce these 
rearrangements we have compared lists of neighboring 
gene family members in the identified paralogous 
chromosome regions between the human, chicken, zeb- 
rafish, stickleback and medaka genomes. The results of 
this analysis are presented in Additional files 3 and 4
and our suggested scenario is summarized in Figure 6. 

The analysis of conserved synteny for the SSTR2, -3
and -5 paralogy regions shows that many of the gene
families, not only SSTR, display the same paralog trans- 
locations between the homologous chromosome regions 
generated in 2R. Notable examples are the CYTH 
(Figure S27), FSCN (Figure S30), GGA (Figure S31), GRIN2 
(Figure S33), KCNJ (Figure S34), KCTD (Figure S35), SOX 
(Figure S44) and TNRC (Figure S46) families (see
Additional file 6). In all the analyzed genomes these families
have two or three 2R-generated subtype genes located
on the same chromosome regions with 3R-generated 
duplicates on other chromosomes (see Additional file 4). 
The GRIN2 PhyML tree is shown as an example in 
Figure 3 and the FSCN PhyML tree can be seen in 
Figure 4. 

This situation allows us to infer the scenario presented 
in Figure 6: Three of the four 2R-generated paralogous 
chromosome blocks were fused into the same chromo- 
some in the ray-finned fish lineage sometime after the 
spotted gar had branched off approximately 350 MYA 
and before 3R in the teleost lineage (for time point
estimates see Amores et al. (2011) [45]). After the 3R event
and before the last common ancestor of the studied spe- 
cies, the now duplicated fused chromosome blocks 
exchanged paralogs and subsequently one of them was 
split by fission events. In all the analyzed teleost gen- 
omes we observe these fused and rearranged regions on 
at least three chromosomes (Figures 6 and 7). There 
seem to have been more fissions and rearrangements in 
the zebrafish lineage (Figure 7). It is likely that many of 
the rearrangements occurred as part of larger blocks and 
subsequently local rearrangements have jumbled the ancestral
order. This scenario is corroborated by the orthology
relationships inferred from the phylogenetic analyses
of the neighboring families (see Additional file 6, Figures
S21-S50). The fact that it is 2R-generated duplicates that 
have been co-located by the chromosome fusions, and 
not primarily 3R-generated duplicates, shows that the 
fusions occurred before 3R. 
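
The logic of this argument can be made explicit by grouping SSTR genes by chromosome and asking whether co-located genes are different 2R subtypes or 3R copies of the same subtype. The following is a minimal sketch that uses only the stickleback locations given in the Methods; the function names and the parsing of gene symbols are illustrative assumptions.

```python
from collections import defaultdict

# Stickleback SSTR2, -3 and -5 locations (linkage groups) as stated in the Methods.
STICKLEBACK = {
    "SSTR2a": "XI",
    "SSTR3": "XI",
    "SSTR5a": "XI",
    "SSTR2b": "V",
    "SSTR5b": "IX",
}


def subtype(gene):
    """Strip a trailing 3R copy letter (a/b): 'SSTR2a' -> 'SSTR2'."""
    return gene.rstrip("ab")


def colocated_subtypes(locations):
    """Per chromosome, the distinct 2R subtypes found there.
    Only chromosomes carrying more than one subtype are reported."""
    by_chrom = defaultdict(set)
    for gene, chrom in locations.items():
        by_chrom[chrom].add(subtype(gene))
    return {c: sorted(s) for c, s in by_chrom.items() if len(s) > 1}


print(colocated_subtypes(STICKLEBACK))
```

In this example the only linkage group with several SSTR genes carries three different 2R subtypes, while the a and b copies of a given subtype sit on different linkage groups, which is the pattern expected if the fusions preceded 3R.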

We could see similar chromosomal rearrangements in 
the paralogous regions bearing SSTR1, -4 and -6 genes, 
although not to the same extent. Due to the lower de- 
gree of SSTR gene retention after 2R and 3R in this 
paralogon, fewer neighboring families could be identified 
as belonging to the paralogy block. Nonetheless some 
gene families seem to have translocated duplicates be- 
tween homologous chromosomes after 3R (Figure 6). 
The highest degree of such translocations can be seen 
in the zebrafish where the ABDH (Figure S4), FOXA 
(Figure S7), JAG (Figure S9), NIN (Figure S10), NKX2 
(Figure S11), PAX (Figure S12), PYG (Figure S13),
RALGAPA (Figure S14) and VSX (Figure S20) families 
have duplicates of 2R-generated subtypes located on 
the same chromosome (see Additional file 3). 

There have been some indications of these transloca- 
tions in previously published large-scale genomic ana- 
lyses. For instance, in the analysis of the published 
medaka genome [5] the rearrangements after 3R be- 
tween the SSTR2, -3 and -5-paralogous regions on chro- 
mosomes 1, 8 and 19 are apparent. Our analyses allow 
us to resolve the events in greater detail: We conclude 
that these rearrangements in the teleost lineage were 
preceded by fusions of 2R-generated chromosome blocks 
before 3R, with subsequent paralog translocations and 
chromosome fissions after 3R (Figure 6). This fusion sce- 
nario is supported by a large-scale reconstruction of the 
ancestral vertebrate genome [4], where these reorga- 
nizations were concluded from comparative genomic 
analyses including the medaka. However, these analyses 
suggested that the fusions occurred before the 
Figure 7 Continued from Figure 6. Paralogous chromosome regions in the stickleback and zebrafish genomes. More rearrangements could be
identified in the zebrafish genome than in the stickleback or medaka genomes. 
divergence of lobe-finned and ray-finned fishes. Our
conserved synteny analysis on the other hand shows that 
the tetrapod genomes have no signs of ancestral fusions 
in this paralogon (Figure 6, see Additional file 4). To- 
gether with the locations of the SSTR2, -3 and -5 genes 
on different linkage groups in the spotted gar genome 
(Table 1), our data instead point towards a time frame 
for the chromosomal fusions after the divergence of the 
gar lineage and before 3R in the teleost lineage. Both 
these large-scale analyses also support our scenario for 
the rearrangements between SSTR1, -4 and -6-paralo- 
gous regions in the teleost lineage after 3R. 

The recent mapping of the spotted gar genome [45] 
concluded that its genome organization is more similar 
to that of the human genome than to teleost genomes. 
We were able to predict sequences for all SSTR genes 
except SSTR4 in the genome of the spotted gar, and 
located them to five different genomic linkage groups 
(Table 1). The cited analyses of conserved synteny
between the spotted gar genome and the human, zebrafish
and stickleback genomes are consistent with our own,
and demonstrate that the linkage groups we have identified
as SSTR-bearing in the spotted gar share conserved
synteny with SSTR-bearing chromosome regions in the
other genomes (see supporting information in Amores
et al. (2011) [45]).

It is to be expected that duplicated chromosomes, as 
well as duplicated chromosomal regions that display 
similarity, can undergo rearrangements such as trans- 
location to the same chromosome. We were surprised to 
find that regions that arose as separate chromosomes in 
2R, perhaps 500 MYA, have been fused in the ray-finned 
fish lineage and subsequently exchanged 2R-generated 
paralogs after the 3R event approximately 300 MYA. 
Any such rearrangements require extensive analyses in 
order to be disentangled. We had completed our com- 
prehensive analyses arriving at the scenario shown in 
Figure 6 when the spotted gar genome became available 
and confirmed our suggested scenario. The teleost rear- 
rangements described here may severely hamper efforts 
to use conservation of synteny for identification of 
orthologs between teleosts and other vertebrates. Fortu- 
nately, the spotted gar constitutes a very important 
out-group for comparison with chromosomal events 
involving or surrounding 3R and it will greatly facili- 
tate such analyses [45]. 

Implications for synteny analyses and orthology 
assignment 

Our studies of the two vertebrate paralogons bearing 
SSTR genes have potentially far-reaching implications 
for comparative genomic studies such as analyses of 
conserved synteny. The identification of conserved synteny
is essential for the correct assignment of orthology
and paralogy relationships between genes, and therefore
for the evolutionary studies of gene families [46]. 

We describe here how 2R-generated duplicated 
chromosome blocks fused in the ray-finned fish lineage, 
and how subsequently these fused blocks duplicated in 
3R and exchanged paralogs between each other, likely in 
blocks, blurring much of the conserved synteny patterns 
generated by the whole genome duplications. There have 
also been intra-chromosomal rearrangements within 
these chromosomal blocks, as well as a fission event 
splitting one of the 3R-duplicated blocks. At this point it 
is worth noting that the scenarios describing the evolu- 
tion of the SSTR2, -3 and -5-bearing chromosome 
regions and the SSTR1, -4 and -6-bearing regions differ, 
with the latter showing no sign of fusions after 2R and 
inter-chromosomal exchange of paralogs only after 3R 
(Figures 6 and 7). 

These types of rearrangements of the genomic struc- 
ture make it exceedingly complicated to sort out the 
evolution of genomic regions and to infer orthology and 
paralogy relationships within gene families. We show 
that it is possible to resolve these events if one considers 
both the positional data between several different gen- 
omes as well as the phylogenies of the gene families 
shared between the chromosome regions. These analyses 
also demonstrate the importance of having the appropri- 
ate out-groups to determine the (relative) time points of 
the events. We could confirm the likely ancestral paral- 
ogy relationships between the chromosome regions by 
comparing our findings against the genomes of the spot- 
ted gar and the coelacanth, which were released during 
the final stages of our analyses. The spotted gar genome, 
which has been assembled to linkage groups, proved es- 
sential to confirm the ancestral location of the SSTR 
genes on different chromosomes, and therefore to sup- 
port both the duplication of the chromosome regions in 
2R and the time window of the chromosome block 
fusions. 

The chromosomal rearrangements that we have 
described here undoubtedly complicate the assignment 
of orthology based on synteny analyses. For the SSTR 
family we found that the assignment of SSTR genes to 
specific subtypes needs to take into consideration firstly 
that there is a sixth previously undescribed ancestral ver- 
tebrate subtype, SSTR6, which is more closely related to 
SSTR1 and SSTR4; secondly, that teleost fishes may have 
additional paralogs resulting from the shared third whole 
genome duplication (3R); and thirdly that additional 
duplicates have been generated by independent fourth 
genome duplications in some lineages. The teleost 
SSTR6 sequences that we have identified in this study 
were annotated as SSTR1a in the zebrafish genome
database and SSTR4 in the stickleback and fugu genome
databases. 
Conclusions

By combining analyses of conserved synteny with phylo- 
genetic data we can conclude that two vertebrate ances- 
tral SSTR genes on different chromosomes diversified in 
the basal vertebrate whole genome duplications, 2R, one 
giving rise to SSTR1, -4 and -6 subtype genes, and one 
giving rise to SSTR2, -3 and -5 subtype genes. The 
SSTR6 subtype was previously unrecognized, and could 
be identified in all teleost fish genomes, the spotted gar 
genome as well as the genome of the Comoran coela- 
canth. Conversely, SSTR4 subtype genes could only be 
identified in the analyzed tetrapod genomes as well as the
coelacanth. Taken together these results indicate that six 
SSTR subtype genes were ancestral to both lobe-finned 
and ray-finned fishes, but that reciprocal losses have 
occurred. Subsequently SSTR2, -3 and -5 conserved 
duplicates from the teleost-specific whole genome du- 
plication, 3R. Although there have been losses of SSTR 
subtype genes, the paralogous genome regions could 
be identified in both tetrapod and teleost genomes. 
The positional and phylogenetic data from the analysis 
of conserved synteny indicate that there have been sig- 
nificant rearrangements between paralogous chromo- 
some regions in the teleost genomes, especially between 
SSTR2, -3 and -5-bearing chromosome regions. These 
rearrangements would explain the co-localization of 
SSTR2, -3 and -5 genes in several teleost genomes. That 
these rearrangements occurred in the teleost lineage is 
corroborated by comparison with the spotted gar gen- 
ome, representing a lineage that diverged before teleost 
evolution. 

Methods 

Identification of SSTR sequences in Ensembl genome 
databases 

Amino acid sequences of SSTR family members were 
identified in the Ensembl genome browser
(http://www.ensembl.org) using the automatic protein family
prediction feature. All SSTR sequences and their loca- 
tions have been verified against Ensembl release 67 
(May 2012). In most analyzed genomes the identified 
"somatostatin receptor type" protein family included 
somatostatin receptors of SSTR1, -2, -3, -4 and -5-type
as well as neuropeptide B/W receptors of NPBWR1
and -2-type. The NPBWRs share sequence similarity to
both SSTRs and opioid receptors; however, phylogenetic
analysis as well as their chromosomal locations indicate
that they constitute a separate family of GPCRs
[9]. Hence only the SSTR sequences were considered 
in our analyses. 

The SSTR sequences from the following Ensembl gen- 
ome databases were collected and their database identifiers 
and locations noted: Homo sapiens (human), Mus musculus
(mouse), Canis familiaris (dog), Monodelphis domestica
(grey short-tailed opossum), Gallus gallus (chicken), Anolis
carolinensis (Carolina anole lizard), Silurana (Xenopus) tro- 
picalis (Western clawed frog), Latimeria chalumnae (Com- 
oran coelacanth), Danio rerio (zebrafish), Oryzias latipes 
(medaka), Gasterosteus aculeatus (three-spined stickle- 
back), Tetraodon nigroviridis (green spotted pufferfish), 
Takifugu rubripes (fugu), Ciona intestinalis (vase tunicate) 
and Drosophila melanogaster (fruit fly). Database identi- 
fiers, location data and annotation notes of all SSTR 
sequences, as well as genome assembly versions for each 
species, are listed in Additional file 7. 

To account for possible failures in the automatic iden- 
tification of SSTR protein family members TBLASTN 
searches were also carried out in the Ensembl databases 
as well as in the National Center for Biotechnology In- 
formation (NCBI) Reference Sequence and trace archive 
databases using the known human SSTR sequences as 
queries. Branchiostoma floridae (Florida lancelet, amphi- 
oxus) genomic scaffolds were sought by TBLASTN 
searches in the NCBI Reference Sequence database using 
the known human family member sequences as queries. 
Additionally, complementary searches for teleost fish 
sequences were performed in the NCBI reference se- 
quence database using the identified zebrafish SSTR1 
and SSTR6 sequences. 

Identification of SSTR sequences in the Lepisosteus 
oculatus genome 

SSTR sequences were sought in the Lepisosteus oculatus
(spotted gar) genome assembly LepOcu1 (GenBank
ID: GCA_000242695.1) available from NCBI
(http://www.ncbi.nlm.nih.gov/genome/assembly/327908/).
The sequences of the assembled linkage
groups as well as unplaced scaffolds were downloaded
and a local search database was set up.
TBLASTN searches were carried out in this local
database using the BLAST+ 2.2.26 executables
available from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
with the known
human SSTR sequences as well as the identified 
coelacanth sequences as search queries. The near 
full-length BLAST hits were evaluated by reciprocal 
tblastn searches in the NCBI reference sequence 
database and those that matched identified SSTR 
sequences were included in preliminary neighbor 
joining (NJ) trees (see "Phylogenetic analyses" below)
to ascertain their identities. The positions within the
linkage groups of those BLAST hits that clustered
confidently within the SSTR NJ tree were noted and
the corresponding genomic sequences were inspected
in order to predict the full length of the SSTR genes
in the spotted gar (see "Sequence alignments and
editing of gene and protein sequences" below).
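
The local search described above could be set up along the following lines. This is a hedged sketch rather than the procedure actually used: the file names, database name, E-value cutoff and the 80% query coverage threshold for "near full-length" hits are all assumptions, since the text does not state them. The commands are only constructed here, not executed, and the filter assumes the default column order of the BLAST+ tabular format (`-outfmt 6`).

```python
def blast_commands(genome_fasta, query_fasta, db_name="gar_db", evalue=1e-5):
    """Build (but do not run) the BLAST+ command lines for a local
    nucleotide database and a protein-vs-translated-nucleotide search."""
    return [
        ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name],
        ["tblastn", "-query", query_fasta, "-db", db_name,
         "-evalue", str(evalue), "-outfmt", "6", "-out", "hits.tsv"],
    ]


def near_full_length(tabular_line, query_length, min_coverage=0.8):
    """True if a tabular hit covers at least min_coverage of the query.
    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    fields = tabular_line.rstrip("\n").split("\t")
    qstart, qend = int(fields[6]), int(fields[7])
    return (qend - qstart + 1) / query_length >= min_coverage


# Hypothetical hit lines for a 390-residue query.
long_hit = "\t".join(["q1", "LG1", "65.0", "351", "120", "3",
                      "20", "370", "1000", "2052", "1e-120", "400"])
short_hit = "\t".join(["q1", "LG2", "40.0", "101", "60", "1",
                       "50", "150", "500", "802", "1e-08", "55"])
print(near_full_length(long_hit, 390), near_full_length(short_hit, 390))
```

Each command list can be passed to `subprocess.run` on a system where BLAST+ is installed.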
Identification and analysis of neighboring gene families/
Conserved synteny analysis 

Lists of gene predictions corresponding to the different 
SSTR-bearing chromosome blocks were downloaded 
using the BioMart function in the Ensembl database ver- 
sion 56. The chromosome blocks were defined as 15 Mb 
in each direction of the SSTR gene in question, although 
in many cases this definition encompassed the entire 
chromosome. These blocks were compared with each 
other in order to identify those gene families, as defined 
by Ensembl's automatic protein family prediction, that
are represented on several of the blocks across different 
species. 
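
The block definition and the comparison between blocks can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the function names, the coordinates and the family labels are hypothetical, and the gene lists would in practice come from the BioMart downloads.

```python
from collections import Counter

MB = 1_000_000


def chromosome_block(gene_start, gene_end, chrom_length, flank=15 * MB):
    """Return (start, end) of the block: `flank` bp on each side of the gene,
    clamped to the chromosome ends (1-based coordinates)."""
    return max(1, gene_start - flank), min(chrom_length, gene_end + flank)


def families_on_several_blocks(blocks, min_blocks=2):
    """Families found on at least `min_blocks` of the blocks.
    `blocks` maps a block label to the family IDs of the genes inside it."""
    counts = Counter(fam for fams in blocks.values() for fam in set(fams))
    return sorted(fam for fam, n in counts.items() if n >= min_blocks)


# Hypothetical coordinates: a gene near the start of a long chromosome ...
print(chromosome_block(3_000_000, 3_001_200, 110_000_000))
# ... and a gene in the middle of a 15.8 Mb chromosome (block = whole chromosome).
print(chromosome_block(8_000_000, 8_001_200, 15_800_000))

# Hypothetical family labels on three blocks.
blocks = {"blockA": ["F1", "F2"], "blockB": ["F2", "F3"], "blockC": ["F2", "F1"]}
print(families_on_several_blocks(blocks))
```

The second call shows why the 15 Mb definition often covers an entire chromosome: on short chromosomes the window is clamped at both ends.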

For the analysis of the SSTR1, -4 and -6-bearing 
regions, the human and chicken chromosome blocks 
carrying the SSTR1 and SSTR4 genes were compared 
with each other. The gene families that were represented 
on both chromosomes in the human genome were 
selected for the analysis of conserved synteny and this 
list was complemented with those gene families that 
were represented on both chicken chromosomes as well 
as at least one of the human chromosomes. In this way 
we could account for any possible lineage-specific rear- 
rangements in any of these genomes. The chromosome 
blocks in the human genome (assembly GRCh36) were 
between map positions 23 Mb and 53 Mb on chromo- 
some 14 and between 8 Mb and 38 Mb on chromosome 
20. The chromosome blocks in the chicken genome (as- 
sembly WASHUC2) were between map positions 24 Mb 
and 54 Mb on chromosome 5 and between 1 bp and 
18 Mb on chromosome 3. These blocks represent the 
chromosome regions bearing SSTR1 and SSTR4 genes 
respectively in each species. Teleost genomes were not 
considered in this selection of neighboring gene fam- 
ilies since there seems to have been a lineage-specific 
loss of SSTR4 early in ray-finned fish evolution. Our 
preliminary phylogenetic analysis indicated that tele- 
osts, spotted gar and coelacanth had another distinct 
SSTR gene instead, SSTR6, which we could take ad- 
vantage of in the analysis of conserved synteny. This 
gene has not been assigned to a chromosome location 
except in the zebrafish genome. We attempted to include
the chromosome regions of zebrafish SSTR1
and -6 in the selection of neighboring gene families, 
but this provided no additional ones. 
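
The selection rule for the SSTR1, -4 and -6 paralogon amounts to a small piece of set logic, sketched below. The function name and the family labels are hypothetical; each input is assumed to be the set of family identifiers found on one chromosome block.

```python
def select_families(human_a, human_b, chicken_a, chicken_b):
    """Families on both human blocks, plus families on both chicken blocks
    that are also present on at least one of the human blocks."""
    on_both_human = human_a & human_b
    on_both_chicken_and_one_human = (chicken_a & chicken_b) & (human_a | human_b)
    return on_both_human | on_both_chicken_and_one_human


# Hypothetical family labels on the two human and two chicken blocks.
human_14 = {"F1", "F2", "F3"}
human_20 = {"F1", "F4"}
chicken_5 = {"F1", "F2", "F5"}
chicken_3 = {"F2", "F4", "F5"}

print(sorted(select_families(human_14, human_20, chicken_5, chicken_3)))
```

Here F1 is selected because it occurs on both human blocks, and F2 because it occurs on both chicken blocks and one human block; F5 is excluded because it has no human representative in these regions.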

For the analysis of the SSTR2, -3 and -5-bearing 
regions, the chicken and stickleback chromosome 
blocks were both used. The gene families that were 
represented on all three chromosomes in each of these 
genomes were chosen for the analysis of conserved 
synteny. The chromosome blocks in the chicken gen- 
ome (assembly WASHUC2) were between map posi- 
tions 38 Mb and 69 Mb on chromosome 1, as well as 
the whole of chromosomes 14 (approximately 15.8 Mb)
and 18 (approximately 10.9 Mb). The blocks in the
stickleback genome (assembly BROADS1) correspond
to the full linkage groups V (approximately 12.25 Mb), 
IX (approximately 20.24 Mb) and XI (approximately 
16.20 Mb). Linkage groups V and IX carry the SSTR2b 
and SSTR5b genes respectively, and linkage group XI 
carries three SSTR genes: SSTR2a, SSTR5a and SSTR3. 
The stickleback genome was favored over other teleost 
genomes as all the SSTR genes predicted in this gen- 
ome assembly have been mapped. 

The predicted amino acid sequences of all the identi- 
fied protein family members were downloaded for sub- 
sequent alignment and phylogenetic analysis, and the 
locations of the corresponding predicted genes were 
noted (see Additional files 8 and 9). Locations have 
been verified against Ensembl version 67 (May 2012) 
to ensure the information is up to date. To a large ex- 
tent the same species were included in the phylogen- 
etic analyses of the neighboring gene families as in the 
SSTR tree, with the following exceptions: coelacanth 
and spotted gar sequences were not considered and 
green spotted pufferfish and/or fugu sequences were 
only included when the preliminary phylogenetic ana- 
lyses showed inconclusive teleost fish topologies. Add- 
itionally, sequences from the Macropus eugenii (tammar 
wallaby, assembly 1.0), Taeniopygia guttata (zebra finch, 
assembly 3.2.4), Meleagris gallopavo (turkey, assembly 
2.01), Ciona savignyi (transparent tunicate, assembly 
2.0) and Branchiostoma floridae (Florida lancelet, amphi- 
oxus, assembly 2.0) genome databases were used to com- 
plement missing gene predictions in the genome 
databases for grey short-tailed opossum, chicken and 
vase tunicate for some gene families. For those few gene 
families where no fruit fly, amphioxus or tunicate 
sequences could be identified, the Caenorhabditis elegans 
predicted protein family members were collected from 
the Ensembl database (assembly WBcel215). Database 
identifiers, location data and annotation notes of all 
neighboring family sequences are included in supple- 
mental tables (see Additional files 8 and 9). 

Sequence alignments and editing of gene and protein 
sequences 

The identified amino acid sequences were aligned using 
the ClustalWS sequence alignment program with stand- 
ard settings (Gonnet weight matrix, gap opening penalty 
10.0 and gap extension penalty 0.20) through the 
JABAWS 2 tool in Jalview 2.7 [47]. The alignments were 
manually inspected and edited in Jalview 2.7 in order to 
curate wrongly predicted sequences and adjust poorly 
aligned sequence stretches. Short, incomplete or highly 
diverging amino acid sequence predictions were curated 
manually by analyzing the corresponding genomic 
sequence (including full intron sequences and flanking 
regions) with respect to consensus for splice donor and 
acceptor sites and sequence homology to other family 
members. In this way erroneous automatic exon predic- 
tions and exons that had not been predicted could be 
rectified. 
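
One part of this manual curation, the check of intron boundaries against the splice-site consensus, can be sketched as follows. This is a minimal sketch that assumes only the canonical GT donor and AG acceptor dinucleotides; the curation described above also relied on sequence homology to other family members, which is not modeled here. The function names, coordinates and toy sequence are hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_gene_model(genomic, exons):
    """Check every intron implied by a list of exons.

    exons: sorted list of (start, end) pairs, 0-based and half-open,
    on the same strand as `genomic`. Returns one boolean per intron.
    """
    results = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        results.append(has_canonical_splice_sites(genomic[prev_end:next_start]))
    return results


# Toy sequence: exon, a 12 bp "intron" (GT...AG), exon.
toy_genomic = "ATGAAA" + "GTAAGTTTTCAG" + "GGGTAA"
good_model = check_gene_model(toy_genomic, [(0, 6), (18, 24)])
shifted_model = check_gene_model(toy_genomic, [(0, 7), (18, 24)])
```

An exon boundary shifted by a single base, as in the second call, no longer leaves a GT at the start of the intron, which is the kind of prediction error such a check would flag for closer inspection.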



Phylogenetic analyses 

Phylogenetic trees were made using the Phylogenetic 
Maximum Likelihood (PhyML) method [48] supported 
by a non-parametric bootstrap analysis of 100 replicates 
and assuming the LG matrix of amino acid substitution 
by Le and Gascuel [49]. This method was applied using 
the web-application of the PhyML 3.0 algorithm avail- 
able at http://www.atgc-montpellier.fr/phyml/ or the 
executable PhyML-aBayes (3.0.1 beta) program with the 
following settings: amino acid frequencies (equilibrium 
frequencies), proportion of invariable sites (with opti- 
mised p-invar) and gamma-shape parameters were esti- 
mated from the datasets; the number of substitution 
rate categories was set to 8; BIONJ was chosen to cre- 
ate the starting tree and the nearest neighbor inter- 
change (NNI) tree improvement method was used to 
estimate the best topology; both tree topology and 
branch length optimization were chosen. 
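
For readers who wish to reproduce a comparable run from the command line, the settings listed above can be assembled roughly as shown below. This is a hedged sketch, not the invocation used in the study: the executable name `phyml` and the file name are assumptions, the option letters follow the PhyML 3.0 command-line interface and should be checked against the installed version, and the amino acid frequency option is left at its default because the wording above does not map unambiguously onto a single flag.

```python
def phyml_command(alignment, model="LG", bootstrap=100, search="NNI"):
    """Build an argument list for a PhyML run (nothing is executed here)."""
    return [
        "phyml",               # executable name is an assumption
        "-i", alignment,       # input alignment in PHYLIP format
        "-d", "aa",            # amino acid data
        "-m", model,           # substitution matrix (LG)
        "-b", str(bootstrap),  # non-parametric bootstrap replicates
        "-v", "e",             # estimate proportion of invariable sites
        "-a", "e",             # estimate gamma shape parameter
        "-c", "8",             # eight substitution rate categories
        "-s", search,          # NNI tree improvement
        "-o", "tlr",           # optimise topology, branch lengths, rates
        # No user tree is supplied, so the BIONJ starting tree is used.
    ]


# Hypothetical file name, for illustration only.
command = phyml_command("neighboring_family.phy")
```

The list could be passed to `subprocess.run` once the flags have been verified locally.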

Initially, phylogenetic trees were made using the 
neighbor joining (NJ) method applied through ClustalX 
2.0 [50] with standard settings and a non-parametric 
bootstrap analysis with 1000 replicates. These trees have 
been included for the neighboring gene families in 
Additional files 5 and 6 in order to complement the 
PhyML tree topologies and provide a reference for dis- 
cussion in the cases where tree topologies were inconclu- 
sive (see Results). For both NJ and PhyML tree 
topologies, bootstrap values higher than 50% were con- 
sidered supportive. 
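
Because the NJ and PhyML analyses used different numbers of replicates, support is compared as a percentage. The short sketch below shows the threshold as it reads above (strictly higher than 50%); the function names and the example counts are hypothetical.

```python
def support_percent(times_recovered, replicates):
    """Percentage of bootstrap replicates in which a clade was recovered."""
    return 100.0 * times_recovered / replicates


def is_supportive(percent, threshold=50.0):
    """A node counts as supported only if its value is above the threshold."""
    return percent > threshold


# Hypothetical counts: 62 of 100 PhyML replicates, 500 of 1000 NJ replicates.
phyml_node = is_supportive(support_percent(62, 100))
nj_node = is_supportive(support_percent(500, 1000))
```

A node recovered in exactly half of the replicates is not counted as supported under this reading.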

For the SSTR-family tree (Figure 1) more careful mea- 
sures were taken in order to account for the larger 
number of protein subtypes and animal taxa in this tree 
compared to the neighboring gene families. The Phylo- 
genetic Maximum Likelihood analysis was repeated 
using both a non-parametric bootstrap analysis of 100 
replicates, and an SH-like approximate likelihood ratio 
test (aLRT) [51], in both cases selecting both NNI and 
subtree pruning and regrafting (SPR) tree improvement 
methods rather than only NNI. Additionally, the amino 
acid substitution model for the phylogenetic analysis 
was chosen using ProtTest 3.0 [52] with the following 
settings: Likelihood scores were computed selecting 
the JTT, LG, DCMut, Dayhoff, WAG, Blosum62 and 
VT substitution model matrices, with no add-ons and 
a Fixed BioNJ JTT base tree. Based on this analysis 
the JTT model of amino acid substitution was chosen. 



In most cases the identified fruit fly sequences were 
used as out-groups to root the trees, and where such a 
sequence could not be found the identified amphioxus 
or tunicate sequences were used as the out-group in- 
stead. The inclusion of amphioxus and/or tunicate in the 
phylogenetic analyses provides the relative dating for the 
time window of the 2R events. For two gene families C. 
elegans sequences had to be identified due to the lack of 
fruit fly sequences. For the SSTR-family tree (Figure 1) 
the human kisspeptin receptor (KISS1R or GPR54) sequence 
was chosen as an out-group in order to accurately 
show the branching point of the identified fruit fly 
SSTR-family genes. Kisspeptin receptors are GPCRs 
closely related to the somatostatin receptors [53] (see also 
Additional file 5; Figure S15 in Nordstrom et al (2008) 
[33]), diverging before the protostome-deuterostome split, 
therefore providing a reasonable out-group for our phylo- 
genetic analysis of the SSTR family. 
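
The out-group choice for the neighboring gene families amounts to a simple order of preference, sketched below. The tier order (fruit fly, then amphioxus or tunicate, then C. elegans) is our reading of the description above, amphioxus and tunicate are treated as equivalent because no preference between them is stated, and all sequence identifiers are hypothetical. The SSTR-family tree, rooted with KISS1R, is a separate case and is not covered.

```python
# Preference tiers, highest first; taxa within a tier are treated equally.
OUTGROUP_TIERS = [
    ("fruit fly",),
    ("amphioxus", "tunicate"),
    ("C. elegans",),
]


def choose_outgroup(sequences_by_taxon):
    """Return the sequences of the first tier that has any sequence.

    sequences_by_taxon: dict mapping a taxon name to a list of sequence ids.
    """
    for tier in OUTGROUP_TIERS:
        found = []
        for taxon in tier:
            found.extend(sequences_by_taxon.get(taxon, []))
        if found:
            return found
    return []


# Hypothetical family lacking a fruit fly sequence.
outgroup = choose_outgroup({"amphioxus": ["amph_1"], "C. elegans": ["ce_1"]})
```

Here the amphioxus sequence is selected, and the C. elegans sequence would only be used if the first two tiers were empty.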

Description of additional files 

The following additional data files are available with the 
online version of this paper. The spreadsheets in 
Additional files 7, 8 and 9 include comprehensive infor- 
mation about all sequences analyzed in this study, such 
as database identifiers, location data and annotation 
notes. Figures of all phylogenetic analyses used in the 
study are included in Additional files 2, 5 and 6. The positional 
data underlying our evolutionary scenario is presented 
in Additional files 3 and 4. All final curated 
sequence alignments made for the phylogenetic analyses, 
as well as the original rooted phylogenetic tree files, have 
been provided as citable file sets with persistent identi- 
fiers - see references [54,55]. Detailed notes on the iden- 
tification of SSTR sequences in the genome databases, 
as well as detailed descriptions of the neighboring family 
tree topologies, are included in Additional file 1. 

Additional files 



Additional file 1: Supplemental notes 1-3. Detailed descriptions of 
the results, including the identification of SSTR sequences in genome 
databases as well as the phylogenetic analyses of neighboring gene 
families. 

Additional file 2: Figures S1-S3. All phylogenetic analyses of the SSTR 
gene family. 

Additional file 3: Table S4. Positional data for the SSTR1, -4 and 
-6-bearing chromosome regions. The members of the identified 
neighboring gene families in these chromosome regions are charted by 
species and chromosome/genomic scaffold. These charts show the 
identified paralogous chromosome regions in the human, chicken, 
medaka, stickleback and zebrafish genomes. Each species is included in a 
separate tab in the spreadsheet. 

Additional file 4: Table S5. Positional data for the SSTR2, -3 and -5- 
bearing chromosome regions. The members of the identified 
neighboring gene families in these chromosome regions are charted by 
species and chromosome/genomic scaffold. These charts show the 
identified paralogous chromosome regions in the human, chicken, 
medaka, stickleback and zebrafish genomes. Each species is included in a 
separate tab in the spreadsheet. 

Additional file 5: Figures S4-S20. Phylogenetic trees of the SSTR1, -4 
and -6-neighboring gene families. Figures are numbered S4-S20 and 
include both neighbor joining and phylogenetic maximum likelihood 
trees of the gene families described in Table 2. 

Additional file 6: Figures S21-S50. Phylogenetic trees of the SSTR2, -3 
and -5-neighboring gene families. Figures are numbered S21-S50 and 
include both neighbor joining and phylogenetic maximum likelihood 
trees of the gene families described in Table 3. 

Additional file 7: Table S1. Database identifiers, location data and 
annotation notes of all SSTR sequences identified and included in this 
study. 

Additional file 8: Table S2. Database identifiers, location data and 
annotation notes of SSTR1, -4 and -6-neighboring gene family sequences, 
including information for those gene families that were discarded from 
the analysis. Each gene family is included in a separate tab in the 
spreadsheet. 

Additional file 9: Table S3. Database identifiers, location data and 
annotation notes of SSTR2, -3 and -5-neighboring gene family sequences, 
including information for those gene families that were discarded from 
the analysis. Each gene family is included in a separate tab in the 
spreadsheet. 